1 Introduction

There is some confusion about how students are classified as full-time and part-time in these datasets. According to the Montgomery College website, a full-time student is a student who attempts 12 or more credits, and a part-time student is a student who attempts fewer than 12 credits. I will now reclassify students according to these definitions and observe the differences in the distribution.

In this part of my project I will refine my research questions and further examine the effects of the pandemic on recent MCPS high school graduates enrolled at Montgomery College. For the purposes of this study I will limit my dataset to MCPS students aged 20 and younger. These MCPS students will be divided further into subgroups based on gender and race. The datasets used in this part of my project were already cleaned in my initial data analysis, but outliers have not been removed, so I will conduct my statistical analysis both with and without the outliers.

2 Data Dictionary

For the purposes of this project, the following variables and definitions are important.

The population in this dataset is the incoming cohort of students in Fall 2019 and Fall 2020. These students are first-time degree or certificate seekers and have no prior tertiary education. They may have earned AP credits in high school.

Fall 2019 refers to the incoming freshman cohort in Fall 2019. This is term year 2020.
Fall 2020 refers to the incoming freshman cohort in Fall 2020. This is term year 2021.

Variables of Interest:
term_year: Incoming students in Fall 2019 are assigned to term year 2020. Incoming students in Fall 2020 are assigned to term year 2021.
hours_earned: credit hours the student earned in their first Fall semester (this can include credits earned in the Summer school second session, Summer 1, and AP credits earned in high school).
hours_attempted: credit and non-credit hours the student attempted in their first Fall semester (this may include credits attempted in the Summer school second session, Summer 1).
full_part: whether the student is full-time (FT) or part-time (PT). Part-time students are registered for fewer than 12 credit hours; full-time students take at least 12 credits.
major: degree programme the student is registered for, or certificate & LR (letter of recommendation). All certificates and letters of recommendation have been grouped together.
hours_earned_rate: ratio of hours_earned to hours_attempted (hours_earned / hours_attempted).
age: age of the student at the start of the program.
race: racial classification of the student.
sex: gender classification of the student.
high_school: name of the high school the student graduated from. Public high schools in Montgomery County are classified as MCPS (Montgomery County Public Schools).
pell_grant: whether or not the student receives a Pell Grant.
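The skim summary below reports a maximum hours_earned_rate of 3.2 and a minimum unearned_hours of -22. A ratio above 1 is possible because hours_earned can include credits, such as AP credits, that are not counted in hours_attempted. The following base R sketch illustrates this with hypothetical values; it also assumes that unearned_hours is hours_attempted minus hours_earned, which the data dictionary does not state.

```r
# Hypothetical records (not taken from the dataset)
students <- data.frame(
  hours_attempted = c(12, 15, 6, 3),
  hours_earned    = c(12,  9, 6, 9)   # last student: 3 attempted + 6 assumed AP credits
)

# hours_earned_rate = hours_earned / hours_attempted
students$hours_earned_rate <- students$hours_earned / students$hours_attempted

# ASSUMPTION: unearned_hours = hours_attempted - hours_earned
students$unearned_hours <- students$hours_attempted - students$hours_earned

print(students)
```

The last row shows how AP credits can push the rate above 1 and make unearned_hours negative.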

3 Data Wrangling

3.1 Import Data

Summary of Data and Types

skim(df_Degrees)
Data summary
Name df_Degrees
Number of rows 7123
Number of columns 24
_______________________
Column type frequency:
character 15
logical 1
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
sex 0 1.00 1 1 0 4 0
race 0 1.00 5 22 0 9 0
age 0 1.00 4 7 0 5 0
high_school 0 1.00 7 30 0 163 0
full_part 0 1.00 2 2 0 2 0
city 19 1.00 5 19 0 127 0
stat_code 19 1.00 2 2 0 16 0
pell_grant 0 1.00 1 1 0 2 0
camp_code 140 0.98 1 1 0 6 0
major 0 1.00 1 61 0 34 0
pass_engl 0 1.00 1 1 0 2 0
pass_math 0 1.00 1 1 0 2 0
summer1 0 1.00 1 1 0 1 0
fall 0 1.00 1 1 0 1 0
HS_classify 0 1.00 2 14 0 7 0

Variable type: logical

skim_variable n_missing complete_rate mean count
MCPS 0 1 0.7 TRU: 4963, FAL: 2160

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
u_number 0 1 20196625.60 5027.06 20190001 20191872.50 20193733.00 20201703.5 20203588.0 ▇▃▁▂▇
zip 19 1 20886.64 1559.40 1460 20853.00 20877.00 20903.0 94025.0 ▁▇▁▁▁
hours_attempted 0 1 12.46 6.23 1 9.00 12.00 15.0 54.0 ▆▇▁▁▁
hours_earned 0 1 7.85 7.43 0 3.00 6.00 12.0 54.0 ▇▃▁▁▁
mc_gpa 0 1 2.19 1.47 0 0.67 2.50 3.5 4.0 ▆▂▃▅▇
term_year 0 1 2020.47 0.50 2020 2020.00 2020.00 2021.0 2021.0 ▇▁▁▁▇
hours_earned_rate 0 1 0.57 0.38 0 0.23 0.64 1.0 3.2 ▇▇▁▁▁
unearned_hours 0 1 4.61 4.24 -22 0.00 4.00 7.0 25.0 ▁▁▇▂▁

Change Datatypes

df_Degrees$u_number<- as.character(df_Degrees$u_number)
df_Degrees$term_year<- as.character(df_Degrees$term_year)

3.2 Create DataFrame of students who graduated from MCPS high schools and are 20 years old and younger

Use the dataframe df_Degrees, which was cleaned in the initial data analysis, and filter all MCPS students who are 20 years old or younger.

df_MCPS20D <- df_Degrees %>%
         filter(HS_classify == "MCPS") %>%               # students who graduated from MCPS high schools
         filter(age == "18 - 20" | age == "< 18") %>%    # students who are 20 years old and younger
         # reclassify full_part with the Montgomery College definition:
         # fewer than 12 attempted credits = "PT", 12 or more = "FT"
         # (mutate() overwrites the original full_part column)
         mutate(full_part = ifelse(hours_attempted < 12, "PT", "FT"))
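The introduction mentions observing how the distribution changes after reclassification. Because the code above replaces the original full_part values, a copy of the original label would have to be kept first to compare the two. The sketch below uses hypothetical records in base R: it cross-tabulates original against reclassified labels and checks that no reclassified full-time student attempts fewer than 12 credits.

```r
# Hypothetical records: original full_part label and hours attempted
old <- data.frame(
  full_part       = c("FT", "FT", "PT", "PT", "PT"),
  hours_attempted = c(15,   10,   12,    6,    9)
)

# Reclassify with the rule used above: fewer than 12 credits = "PT", otherwise "FT"
new_full_part <- ifelse(old$hours_attempted < 12, "PT", "FT")

# Cross-tabulate original vs reclassified labels to see who changed group
changes <- table(original = old$full_part, reclassified = new_full_part)
print(changes)

# After reclassification no FT student should attempt fewer than 12 credits
stopifnot(!any(new_full_part == "FT" & old$hours_attempted < 12))
```

The off-diagonal cells of the table count the students whose group changed.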

4 Demographics of Students who graduated from MCPS high schools and are 20 years and younger

4.1 Full-time versus Part-time Degree Students

Frequency of part-time versus full-time students: 2020 vs 2021

# Number of students part time and full time  2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=full_part, fill=full_part)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=2,size=3)+
      facet_wrap(~term_year)+
      ggtitle("Number of Students Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

# change in overall MCPS student population from 2020 to 2021

df_MCPS20D%>%
          group_by(term_year,full_part)%>%
          count(full_part)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(full_part)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 5
## # Groups:   full_part [2]
##   term_year full_part     n total_pop pct_change
##   <chr>     <chr>     <int>     <int>      <dbl>
## 1 2020      FT         1495      2456     NA    
## 2 2021      FT         1497      2303      0.134
## 3 2020      PT          961      2456     NA    
## 4 2021      PT          806      2303    -16.1

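Full-time enrollment of MCPS graduates was essentially unchanged in term year 2021 (1,495 to 1,497 students, a 0.13% increase), while part-time enrollment fell by 16.1% (961 to 806 students). Overall, the number of MCPS graduates aged 20 and younger fell from 2,456 to 2,303, a 6.2% decrease.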
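The percentage change used throughout this section is (n - lag(n)) / lag(n) * 100, i.e. (2021 count - 2020 count) / 2020 count * 100. As an arithmetic check, this base R sketch recomputes the figures from the counts in the table above.

```r
# Counts from the table above (term year 2020 vs 2021)
n_2020 <- c(FT = 1495, PT = 961, Total = 2456)
n_2021 <- c(FT = 1497, PT = 806, Total = 2303)

# Percentage change = (new - old) / old * 100, as in the dplyr pipeline
pct_change <- (n_2021 - n_2020) / n_2020 * 100
print(round(pct_change, 2))
```

The same formula, applied within each group, produces the race and gender tables that follow.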

4.2 Race

Count of Race Groups

ggplot(data=df_MCPS20D, aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0,size=3)+
      facet_wrap(~term_year + full_part)+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())+
      ggtitle("Number of Students per Race Group")+
      xlab("Race")+
      ylab("Frequency")

Full time student: Change in enrollment from 2020 to 2021 based on Race

# calculate percentage change in full time student enrollment from 2020 to 2021 by  race

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     4     NA    
##  2 2021      Am. Indian / AK Native     1    -75    
##  3 2020      Asian                    252     NA    
##  4 2021      Asian                    217    -13.9  
##  5 2020      Black / African Am.      341     NA    
##  6 2021      Black / African Am.      307     -9.97 
##  7 2020      Foreign                   98     NA    
##  8 2021      Foreign                   98      0    
##  9 2020      Hawaiian / Pac. Isl.       4     NA    
## 10 2021      Hawaiian / Pac. Isl.       3    -25    
## 11 2020      Hispanic                 482     NA    
## 12 2021      Hispanic                 569     18.0  
## 13 2020      Multi-Race                66     NA    
## 14 2021      Multi-Race                60     -9.09 
## 15 2020      Unknown                   10     NA    
## 16 2021      Unknown                    3    -70    
## 17 2020      White                    238     NA    
## 18 2021      White                    239      0.420

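Full-time students: Asian enrollment declined by 13.9% and Black/African American enrollment by 10.0%, while Hispanic enrollment increased by 18.0%. Enrollment of White students (+0.4%) and foreign students (0%) was essentially unchanged, and Multi-Race enrollment declined by 9.1%. The large percentage changes for the smallest groups (American Indian/Alaska Native, Hawaiian/Pacific Islander and Unknown) are based on ten or fewer students per year and should be interpreted with caution.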
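In the pipeline above, group_by(race) together with arrange(term_year) ensures that lag(n) is the previous year's count for the same race, so the first year of each group gets NA. A base R sketch of the same per-group lag, using three rows of full-time counts from the table above, might look like this (ave() stands in for the grouped lag; it is not what dplyr does internally):

```r
# A subset of the full-time counts from the table above
counts <- data.frame(
  term_year = c("2021", "2020", "2021", "2020", "2020", "2021"),
  race      = c("Asian", "Asian", "Hispanic", "Hispanic", "White", "White"),
  n         = c(217, 252, 569, 482, 238, 239)
)

# Sort by group, then by year ("2020" sorts before "2021" as a string)
counts <- counts[order(counts$race, counts$term_year), ]

# Previous row within each race group; NA for the first year of each group
prev_n <- ave(counts$n, counts$race, FUN = function(x) c(NA, head(x, -1)))
counts$pct_change <- (counts$n - prev_n) / prev_n * 100
print(counts)
```

Because the lag is computed within each group, a group that appears in only one year (for example American Indian / AK Native females among full-time students) gets NA rather than being compared with a different group.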

Part time student: Change in enrollment from 2020 to 2021 based on Race

# calculate percentage change in part time student enrollment from 2020 to 2021 by  race

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     5      NA   
##  2 2021      Am. Indian / AK Native     1     -80   
##  3 2020      Asian                     89      NA   
##  4 2021      Asian                     73     -18.0 
##  5 2020      Black / African Am.      225      NA   
##  6 2021      Black / African Am.      200     -11.1 
##  7 2020      Foreign                   78      NA   
##  8 2021      Foreign                   52     -33.3 
##  9 2020      Hawaiian / Pac. Isl.       2      NA   
## 10 2021      Hawaiian / Pac. Isl.       1     -50   
## 11 2020      Hispanic                 379      NA   
## 12 2021      Hispanic                 290     -23.5 
## 13 2020      Multi-Race                38      NA   
## 14 2021      Multi-Race                38       0   
## 15 2020      Unknown                    6      NA   
## 16 2021      Unknown                    2     -66.7 
## 17 2020      White                    139      NA   
## 18 2021      White                    149       7.19

Part-time students: enrollment declined by 33.3% for foreign students, 23.5% for Hispanic students, 18.0% for Asian students and 11.1% for Black/African American students. White enrollment increased by 7.2%, and Multi-Race enrollment was unchanged.

4.3 Gender

Gender of Students

# Gender of students part time and full time  2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=sex, fill=sex)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=1,size=3)+
      facet_wrap(~term_year+full_part)+
      ggtitle("Gender of Students: Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Calculate percentage change in full time student enrollment from 2020 to 2021 by gender

# calculate percentage change in full time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       732      NA   
## 2 2021      F       794       8.47
## 3 2020      M       745      NA   
## 4 2021      M       687      -7.79

Full-time students: enrollment of female students increased by 8.5% (732 to 794), while enrollment of male students decreased by 7.8% (745 to 687).

Calculate percentage change in part time student enrollment from 2020 to 2021 by gender

# calculate percentage change in part time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       442       NA  
## 2 2021      F       370      -16.3
## 3 2020      M       498       NA  
## 4 2021      M       427      -14.3

Part-time students: enrollment of female students decreased by 16.3% (442 to 370) and enrollment of male students decreased by 14.3% (498 to 427).

Gender and Race breakdown of full time students

# Gender and Race of full time students  2020 vs 2021

df_MCPS20D%>%
      filter(sex %in% c("F","M"))%>%
      filter(full_part=="FT")%>%
      ggplot(., aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, size=3)+
      facet_wrap(~term_year+sex)+
      ggtitle("Gender and Race of Full time Students")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())


Full-time student enrollment: percentage change by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# create data frames with counts of full time students by race and gender
df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 35 x 5
## # Groups:   race, sex [18]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native F         3     NA    
##  2 2020      Am. Indian / AK Native M         1     NA    
##  3 2021      Am. Indian / AK Native M         1      0    
##  4 2020      Asian                  F       106     NA    
##  5 2021      Asian                  F       110      3.77 
##  6 2020      Asian                  M       144     NA    
##  7 2021      Asian                  M       105    -27.1  
##  8 2020      Black / African Am.    F       159     NA    
##  9 2021      Black / African Am.    F       160      0.629
## 10 2020      Black / African Am.    M       174     NA    
## # … with 25 more rows

Part-time student enrollment: percentage change by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# create data frames with counts of part time students by race and gender
df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 33 x 5
## # Groups:   race, sex [18]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native F         1      NA   
##  2 2020      Am. Indian / AK Native M         4      NA   
##  3 2021      Am. Indian / AK Native M         1     -75   
##  4 2020      Asian                  F        35      NA   
##  5 2021      Asian                  F        24     -31.4 
##  6 2020      Asian                  M        52      NA   
##  7 2021      Asian                  M        49      -5.77
##  8 2020      Black / African Am.    F        98      NA   
##  9 2021      Black / African Am.    F        93      -5.10
## 10 2020      Black / African Am.    M       124      NA   
## # … with 23 more rows

4.4 Majors

Overall Majors trend

Count of majors among full-time students in 2020

z1<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z1 + coord_flip()

Count of majors among full-time students in 2021

z13<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z13 + coord_flip()

Calculate percentage change in full-time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 62 x 4
## # Groups:   major [33]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            3       NA  
##  2 2021      0                            2      -33.3
##  3 2020      American Sign Language       5       NA  
##  4 2021      American Sign Language       1      -80  
##  5 2020      Applied Geography            1       NA  
##  6 2021      Applied Geography            1        0  
##  7 2020      Architectural Technology    12       NA  
##  8 2021      Architectural Technology    16       33.3
##  9 2020      Art                         25       NA  
## 10 2021      Art                         22      -12  
## # … with 52 more rows

Count of majors among part-time students in 2020

z11<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z11 + coord_flip()

Count of majors among part-time students in 2021

z12<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=..count..), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z12 + coord_flip()

Calculate percentage change in part-time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 60 x 4
## # Groups:   major [31]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            5       NA  
##  2 2020      American Sign Language       1       NA  
##  3 2021      American Sign Language       2      100  
##  4 2020      Applied Geography            2       NA  
##  5 2021      Applied Geography            1      -50  
##  6 2020      Architectural Technology    16       NA  
##  7 2021      Architectural Technology     7      -56.2
##  8 2020      Art                         11       NA  
##  9 2021      Art                         14       27.3
## 10 2020      Broadcast Media              5       NA  
## # … with 50 more rows

4.5 High Schools

4.5.1 Full time Student

Breakdown of MCPS high schools attended by full-time students in term year 2020

df_MCPS20D%>%
          filter(full_part=="FT" & term_year=="2020")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2020      Gaithersburg High School         114      1495    7.63
##  2 2020      Montgomery Blair High School     100      1495    6.69
##  3 2020      Northwest HS - Germantown         89      1495    5.95
##  4 2020      Paint Branch High School          85      1495    5.69
##  5 2020      Springbrook Sr High School        83      1495    5.55
##  6 2020      Colonel Zadok Magruder HS         72      1495    4.82
##  7 2020      Wheaton High School               72      1495    4.82
##  8 2020      Clarksburg High School            69      1495    4.62
##  9 2020      James Hubert Blake High School    69      1495    4.62
## 10 2020      Watkins Mill High School          69      1495    4.62
## # … with 15 more rows
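Unlike pct_change, pct_pop divides each high school's count by that year's total (total_pop), so it is a share of the cohort rather than a change over time; total_pop here (1,495) matches the full-time count for term year 2020 in Section 4.1. A small base R check using the first two rows of the table above:

```r
# First two rows of the table above (full-time students, term year 2020)
n <- c("Gaithersburg High School" = 114, "Montgomery Blair High School" = 100)
total_pop <- 1495   # all full-time MCPS students in term year 2020

# Share of the year's cohort, in percent
pct_pop <- n / total_pop * 100
print(round(pct_pop, 2))
```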

Breakdown of MCPS high schools attended by full-time students in term year 2021

df_MCPS20D%>%
          filter(full_part=="FT" & term_year=="2021")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2021      Montgomery Blair High School      93      1497    6.21
##  2 2021      Wheaton High School               91      1497    6.08
##  3 2021      Paint Branch High School          84      1497    5.61
##  4 2021      Gaithersburg High School          83      1497    5.54
##  5 2021      Colonel Zadok Magruder HS         80      1497    5.34
##  6 2021      Northwest HS - Germantown         79      1497    5.28
##  7 2021      Richard Montgomery High School    75      1497    5.01
##  8 2021      Watkins Mill High School          74      1497    4.94
##  9 2021      Clarksburg High School            70      1497    4.68
## 10 2021      Sherwood High School              68      1497    4.54
## # … with 15 more rows
# calculate percentage change in full time student enrollment from 2020 to 2021 by MCPS highschool
df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
          arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups:   high_school [25]
##    term_year high_school                        n pct_change
##    <chr>     <chr>                          <int>      <dbl>
##  1 2021      Walt Whitman High School          21      50   
##  2 2021      Rockville High School             66      46.7 
##  3 2021      Bethesda Chevy Chase High Schl    42      35.5 
##  4 2021      Sherwood High School              68      33.3 
##  5 2021      Seneca Valley High School         55      27.9 
##  6 2021      Wheaton High School               91      26.4 
##  7 2021      Thomas Sprigg Wootton High Sch    34      25.9 
##  8 2021      Richard Montgomery High School    75      17.2 
##  9 2021      Colonel Zadok Magruder HS         80      11.1 
## 10 2021      Watkins Mill High School          74       7.25
## # … with 40 more rows
v1<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="FT" & term_year=="2020")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools from which full-time students in term year 2020 graduated")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v1+ coord_flip()  
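In scales::percent(prop, 0.5) the second positional argument is accuracy, so the bar labels are rounded to the nearest half percentage point and can differ from the pct_pop table (for example, 6.69% should be shown as 6.5%). The base R sketch below mimics that rounding for two proportions from the table; it is an approximation for illustration, not the scales package's own implementation.

```r
# Proportions of full-time 2020 students (114/1495 and 100/1495 from the pct_pop table)
prop <- c(114, 100) / 1495

# Round to the nearest 0.5 percentage point, mimicking accuracy = 0.5
to_half_pct <- function(p) round(p * 100 / 0.5) * 0.5

labels <- paste0(format(to_half_pct(prop), nsmall = 1), "%")
print(labels)
```

If exact values are wanted on the bars, a finer accuracy such as 0.01 could be passed instead.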

v2<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="FT" & term_year=="2021")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools from which full-time students in term year 2021 graduated")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v2+ coord_flip()  

4.5.2 Part time Student

Breakdown of MCPS high schools attended by part-time students in term year 2020

df_MCPS20D%>%
          filter(full_part=="PT" & term_year=="2020")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2020      Gaithersburg High School          73       961    7.60
##  2 2020      John F. Kennedy High School       66       961    6.87
##  3 2020      Northwest HS - Germantown         64       961    6.66
##  4 2020      Montgomery Blair High School      59       961    6.14
##  5 2020      Albert Einstein HS & MC Art Cn    50       961    5.20
##  6 2020      Clarksburg High School            50       961    5.20
##  7 2020      Richard Montgomery High School    50       961    5.20
##  8 2020      Paint Branch High School          48       961    4.99
##  9 2020      Wheaton High School               43       961    4.47
## 10 2020      Quince Orchard Sr High School     42       961    4.37
## # … with 15 more rows

Breakdown of MCPS high schools attended by part-time students in term year 2021

df_MCPS20D%>%
          filter(full_part=="PT" & term_year=="2021")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_pop= (n/total_pop*100))%>%
          arrange(desc(pct_pop))
## # A tibble: 25 x 5
## # Groups:   high_school [25]
##    term_year high_school                        n total_pop pct_pop
##    <chr>     <chr>                          <int>     <int>   <dbl>
##  1 2021      Northwest HS - Germantown         55       806    6.82
##  2 2021      Gaithersburg High School          54       806    6.70
##  3 2021      Northwood High School             45       806    5.58
##  4 2021      Paint Branch High School          44       806    5.46
##  5 2021      Montgomery Blair High School      43       806    5.33
##  6 2021      Colonel Zadok Magruder HS         42       806    5.21
##  7 2021      Quince Orchard Sr High School     39       806    4.84
##  8 2021      Albert Einstein HS & MC Art Cn    38       806    4.71
##  9 2021      Richard Montgomery High School    36       806    4.47
## 10 2021      Clarksburg High School            35       806    4.34
## # … with 15 more rows
# calculate percentage change in part time student enrollment from 2020 to 2021 by MCPS highschool
df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,high_school)%>%
          count(high_school)%>%
          group_by(high_school)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)%>%
          arrange(desc(pct_change))
## # A tibble: 50 x 4
## # Groups:   high_school [25]
##    term_year high_school                        n pct_change
##    <chr>     <chr>                          <int>      <dbl>
##  1 2021      Northwood High School             45      45.2 
##  2 2021      Poolesville Jr-Sr High School     14      40   
##  3 2021      Walter Johnson High School        35      40   
##  4 2021      Thomas Sprigg Wootton High Sch    23      27.8 
##  5 2021      Colonel Zadok Magruder HS         42      16.7 
##  6 2021      Winston Churchill High School     14      16.7 
##  7 2021      James Hubert Blake High School    33       3.12
##  8 2021      Quince Orchard Sr High School     39      -7.14
##  9 2021      Paint Branch High School          44      -8.33
## 10 2021      Damascus High School              20      -9.09
## # … with 40 more rows
v3<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="PT" & term_year=="2020")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools from which part-time students in term year 2020 graduated")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v3+ coord_flip()  

v4<- df_MCPS20D %>% 
    group_by(term_year,full_part) %>% 
    filter(full_part=="PT" & term_year=="2021")%>%
    count(high_school) %>% 
    mutate(prop = n/sum(n)) %>% 
    ggplot(aes(x = high_school, y = prop)) +
    geom_col(aes(fill=high_school), position = "dodge") +
    geom_text(aes(label = scales::percent(prop,0.5), 
                  y = prop, 
                  group = high_school),
              position = position_dodge(width = 0.9),
              vjust = 0, size=3, hjust=0)+
  #  facet_wrap(~term_year )+
      ggtitle("High schools from which part-time students in term year 2021 graduated")+
      ylab('Proportion ')+
      xlab("")+
      theme(legend.position = "none", axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank()) 
  
v4 + coord_flip()  

5 Statistical Analysis with Outliers

For the purposes of this analysis I will run the analysis first with outliers and then after removing outliers.
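The outlier rule is not specified here. One common choice, and the one ggplot2's geom_boxplot uses by default to draw outlier points, flags values more than 1.5 × IQR below the first quartile or above the third quartile, with quartiles from quantile() (default type 7). The base R sketch below applies that rule to hypothetical hours_attempted values; adopting it for this analysis is an assumption.

```r
# Hypothetical hours_attempted values for one group (not from the dataset)
hours <- c(12, 12, 13, 14, 15, 15, 15, 16, 17, 30)

# ASSUMPTION: 1.5 * IQR rule with quantile() type 7 hinges (ggplot2 boxplot default)
q     <- quantile(hours, c(0.25, 0.75))
iqr   <- unname(q[2] - q[1])
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[2]) + 1.5 * iqr

is_outlier    <- hours < lower | hours > upper
print(hours[is_outlier])               # values flagged as outliers
hours_trimmed <- hours[!is_outlier]    # data "without outliers"

# fivenum() uses Tukey's hinges, so its Q1/Q3 can differ slightly from quantile()
print(fivenum(hours))
```

Here quantile() gives Q1 = 13.25 and Q3 = 15.75, while fivenum() gives 13 and 16, so the fivenum summary table later in this section may not match the boxplot hinges exactly.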

5.1 Hours Attempted

Boxplots of hours_attempted by year for MCPS students 20 years and younger

p11 = ggplot(df_MCPS20D, aes(hours_attempted))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Students who register for more than 18 credits require special permission from the department. Furthermore, a full-time student is classified as someone who is enrolled in 12 or more credits, and a part-time student as someone who is enrolled in fewer than 12 credits. In the original full_part classification, however, a number of full-time students attempted fewer than 12 credits and a large number of part-time students attempted more than 12 hours. Because full_part was reclassified from hours_attempted in Section 3.2, full-time students in df_MCPS20D attempt 12 or more credits and part-time students attempt fewer than 12.
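The credit thresholds above can be combined into a three-way credit-load label: fewer than 12 credits is part-time, 12 to 18 is a normal full-time load, and more than 18 needs permission. The base R sketch below is illustrative only; the function name credit_load and the label "FT_over_18" are hypothetical, and the input values are made up apart from 54, the maximum hours_attempted in the skim summary of df_Degrees.

```r
# Hypothetical helper: label a credit load using the thresholds stated above
#   fewer than 12 -> "PT"; 12 to 18 -> "FT"; more than 18 -> "FT_over_18" (needs permission)
credit_load <- function(hours_attempted) {
  ifelse(hours_attempted < 12, "PT",
         ifelse(hours_attempted <= 18, "FT", "FT_over_18"))
}

hours <- c(6, 11.5, 12, 18, 19, 54)
print(credit_load(hours))
print(table(credit_load(hours)))
```

Applied to hours_attempted, a table of this label would show how many full-time students exceed the 18-credit limit.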

Boxplots of hours_attempted by term year for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)
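The race filter used in these chunks chains four `==` comparisons with `|`. In the sketch below, the same selection is written once with `%in%` against a vector of group names. `major_races` is a hypothetical name, and the sample values are illustrative. Inside a pipeline this would read `filter(race %in% major_races)`. One small difference: for a missing race, `==` returns `NA` while `%in%` returns `FALSE`. Because `filter()` drops both, the selected rows should be the same.

```r
# Hypothetical vector of the four groups plotted in this section.
major_races <- c("White", "Asian", "Hispanic", "Black / African Am.")

# Illustrative race values.
race <- c("White", "Multi-Race", "Asian", "Foreign",
          "Hispanic", "Black / African Am.")

# The chained comparison used in the chunks above ...
or_chain <- race == "White" | race == "Asian" | race == "Hispanic" |
  race == "Black / African Am."
# ... and the equivalent membership test.
in_set <- race %in% major_races

print(race[in_set])
```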

Boxplots of hours_attempted by term year for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are few outliers in the part-time groups. Term year 2021 appears to have more outliers at the upper end.
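Because the analysis will be repeated without outliers, it may help to state the outlier rule explicitly. As I understand ggplot2, `geom_boxplot()` takes its quartiles from `quantile()` and marks points lying more than 1.5 * IQR beyond the box as outliers. The sketch below applies that rule to made-up hours and compares the mean with and without the flagged values. This is one common rule, not necessarily the one this project uses later. `iqr_outliers()` is a hypothetical helper.

```r
# Hypothetical helper: TRUE for values more than coef * IQR below Q1 or
# above Q3, with quartiles from quantile() (default type 7).
iqr_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr
  upper <- q[2] + coef * iqr
  !is.na(x) & (x < lower | x > upper)
}

# Made-up hours attempted; 40 sits far above the rest.
hours <- c(12, 12, 13, 14, 15, 15, 16, 18, 40)
flag <- iqr_outliers(hours)

print(hours[flag])   # values beyond the whiskers
print(c(with_outliers = mean(hours), without_outliers = mean(hours[!flag])))
```

On this toy vector Q1 = 13 and Q3 = 16, so the fences are 8.5 and 20.5, and only 40 is flagged.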

Density plot of hours_attempted by year

ggplot(df_MCPS20D, aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Full-time Students vs Part-time Students")
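The density plots compare term years on a density scale rather than by head count. As I understand ggplot2's defaults, `geom_density()` uses a Gaussian kernel with the `"nrd0"` bandwidth (the same defaults as `stats::density()`), and each curve integrates to 1. The sketch below uses made-up hours to check that the area under each of two curves with different sample sizes is about 1. That normalization is why groups of unequal size are compared by shape here. `area()` is a hypothetical helper.

```r
# Made-up hours attempted for two groups of different sizes.
hours_2020 <- c(12, 12, 13, 14, 15, 15, 16, 18, 21, 24)
hours_2021 <- c(12, 13, 13, 14, 15, 16, 17, 18)

# Kernel density estimates with a Gaussian kernel and "nrd0" bandwidth.
d20 <- density(hours_2020, bw = "nrd0", kernel = "gaussian")
d21 <- density(hours_2021, bw = "nrd0", kernel = "gaussian")

# Approximate the area under a density curve with the trapezoid rule.
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

print(c(area_2020 = area(d20), area_2021 = area(d21)))
```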

Hours attempted by full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density") +
  ggtitle(" Hours Attempted by Full-time Students")

Five-number summary of hours_attempted for full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          4    13  15       18  27.5    36  21.2 10.1 
##  2 Am. Indian / AK N… 2021          1    13  13       13  13      13  13   NA   
##  3 Asian              2020        252    12  13       15  21      52  18.5  8.13
##  4 Asian              2021        217    12  13       15  18      46  17.3  6.89
##  5 Black / African A… 2020        341    12  12       13  15      42  14.2  3.38
##  6 Black / African A… 2021        307    12  13       14  16      38  15.4  4.05
##  7 Foreign            2020         98    12  13       14  18      31  15.6  3.98
##  8 Foreign            2021         98    12  13       15  16      37  16.1  5.26
##  9 Hawaiian / Pac. I… 2020          4    12  12.5     13  14      15  13.2  1.26
## 10 Hawaiian / Pac. I… 2021          3    12  15.5     19  24.5    30  20.3  9.07
## 11 Hispanic           2020        482    12  12       13  16      39  15.0  4.21
## 12 Hispanic           2021        569    12  13       14  16      43  15.6  4.36
## 13 Multi-Race         2020         66    12  12       14  18      44  17.3  8.02
## 14 Multi-Race         2021         60    12  12.5     14  17      43  16.2  6.27
## 15 Unknown            2020         10    12  12       14  15      31  15.6  5.72
## 16 Unknown            2021          3    12  12       12  13      14  12.7  1.15
## 17 White              2020        238    12  12       14  18      46  16.9  7.17
## 18 White              2021        239    12  13       15  18      54  17.0  6.50
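The summary tables call `fivenum()` five times per group. Note that `fivenum()` returns Tukey's hinges, which can differ slightly from the quartiles that `quantile()` returns with its default type 7. Q1 and Q3 in these tables may therefore not match other software exactly. The sketch below shows the difference on a tiny vector. It also wraps the summary in a hypothetical helper, `summarise_hours()`, that calls `fivenum()` once. Separately, the `summarise()` message in the output can be silenced by passing `.groups = "drop"`, the argument the message itself points to.

```r
# Tukey's five-number summary vs. type-7 quantiles on the same data.
x <- c(1, 2, 3, 4)
print(fivenum(x))                                            # 1.0 1.5 2.5 3.5 4.0
print(quantile(x, c(0, 0.25, 0.5, 0.75, 1), names = FALSE))  # 1.00 1.75 2.50 3.25 4.00

# Hypothetical helper: compute fivenum() once and return the same columns
# as the summarise() calls above.
summarise_hours <- function(v) {
  f <- fivenum(v)
  c(n = length(v), min = f[1], Q1 = f[2], median = f[3],
    Q3 = f[4], max = f[5], mean = mean(v), sd = sd(v))
}

print(summarise_hours(c(12, 13, 15, 21, 52)))
```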

Hours attempted by part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Part-time Students")

Five-number summary of hours_attempted for part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          5     3     3    6       8     9  5.8   2.77
##  2 Am. Indian / AK N… 2021          1     6     6    6       6     6  6    NA   
##  3 Asian              2020         89     2     7    9      10    11  8.26  2.57
##  4 Asian              2021         73     3     8    9      10    11  8.45  2.47
##  5 Black / African A… 2020        225     1     6    9      10    11  7.78  2.56
##  6 Black / African A… 2021        200     1     6    8      10    11  7.61  2.69
##  7 Foreign            2020         78     3     6    8      10    11  7.58  2.46
##  8 Foreign            2021         52     3     5    9      10    11  7.71  2.71
##  9 Hawaiian / Pac. I… 2020          2     6     6    7.5     9     9  7.5   2.12
## 10 Hawaiian / Pac. I… 2021          1     5     5    5       5     5  5    NA   
## 11 Hispanic           2020        379     1     6    8      10    11  7.75  2.41
## 12 Hispanic           2021        290     1     6    9      10    11  8.11  2.52
## 13 Multi-Race         2020         38     1     5    9       9    11  7.26  2.87
## 14 Multi-Race         2021         38     3     6    9      10    11  8     2.45
## 15 Unknown            2020          6     7     9    9.5    10    10  9.17  1.17
## 16 Unknown            2021          2     4     4    6.5     9     9  6.5   3.54
## 17 White              2020        139     1     6    9      10    11  7.94  2.65
## 18 White              2021        149     3     5    8      10    11  7.60  2.71

5.2 Hours Earned

Boxplots of hours_earned by term year for MCPS students aged 20 and younger

p11 = ggplot(df_MCPS20D, aes(hours_earned))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Boxplots of hours_earned by term year for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_earned by term year for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are few outliers in the part-time groups. Term year 2021 appears to have more outliers at the upper end.

Density plot of hours_earned by year

ggplot(df_MCPS20D, aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours Earned") +
  ylab( "Density")+
  ggtitle(" Hours Earned by Full-time vs Part-time Students")

Hours earned by full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Full-time Students")

Five-number summary of hours_earned for full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          4    10  12     16.5  27.5    36 19.8  11.4 
##  2 Am. Indian / AK N… 2021          1    13  13     13    13      13 13    NA   
##  3 Asian              2020        252     0  10     13    18      52 15.6   9.18
##  4 Asian              2021        217     0   9     13    16      46 14.0   8.47
##  5 Black / African A… 2020        341     0   6      9    12      42  9.44  5.57
##  6 Black / African A… 2021        307     0   6      9    13      37 10.0   6.57
##  7 Foreign            2020         98     0   6     11    14      31 10.9   6.71
##  8 Foreign            2021         98     0   6     10    13      37 10.7   7.87
##  9 Hawaiian / Pac. I… 2020          4     0   4.5   10.5  12.5    13  8.5   5.92
## 10 Hawaiian / Pac. I… 2021          3     9  12.5   16    23      30 18.3  10.7 
## 11 Hispanic           2020        482     0   6     10    13      38 10.3   6.57
## 12 Hispanic           2021        569     0   6     11    13      42 10.5   6.49
## 13 Multi-Race         2020         66     0   7     12    15      44 13.9   9.82
## 14 Multi-Race         2021         60     0   6     11    15      43 11.6   8.79
## 15 Unknown            2020         10     3   4      9.5  14      31 11     8.18
## 16 Unknown            2021          3     3   5      7     9.5    12  7.33  4.51
## 17 White              2020        238     0   7     12    16      46 13.3   9.09
## 18 White              2021        239     0   9     12    16      54 13.0   8.29

Hours earned by part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Part-time Students")

Five-number summary of hours_earned for part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          5     0     0    3       3     3  1.8   1.64 
##  2 Am. Indian / AK … 2021          1     3     3    3       3     3  3    NA    
##  3 Asian             2020         89     0     1    5       7    11  4.64  3.59 
##  4 Asian             2021         73     0     3    3       6    11  4.27  3.29 
##  5 Black / African … 2020        225     0     0    3       6    11  3.14  3.07 
##  6 Black / African … 2021        200     0     0    3       6    11  2.74  3.22 
##  7 Foreign           2020         78     0     0    3       6    11  3.71  3.38 
##  8 Foreign           2021         52     0     0    0       6    11  2.69  3.32 
##  9 Hawaiian / Pac. … 2020          2     0     0    0       0     0  0     0    
## 10 Hawaiian / Pac. … 2021          1     3     3    3       3     3  3    NA    
## 11 Hispanic          2020        379     0     0    3       6    11  3.37  3.21 
## 12 Hispanic          2021        290     0     0    3       6    11  3.69  3.19 
## 13 Multi-Race        2020         38     0     1    3       9    11  4.45  3.85 
## 14 Multi-Race        2021         38     0     0    3       6    10  3.34  3.16 
## 15 Unknown           2020          6     0     1    2.5     6     9  3.5   3.51 
## 16 Unknown           2021          2     3     3    3.5     4     4  3.5   0.707
## 17 White             2020        139     0     0    5       7    11  4.58  3.56 
## 18 White             2021        149     0     2    4       6    11  4.34  3.33

5.3 GPA

Boxplots of GPA by term year for MCPS students aged 20 and younger

p11 = ggplot(df_MCPS20D, aes(mc_gpa))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Boxplots of GPA by term year for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of GPA by term year for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Density plot of GPA by year

ggplot(df_MCPS20D, aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("GPA") +
  ylab( "Density")+
  ggtitle(" GPA by Full-time vs Part-time Students")

GPA of full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Full-time Students")

Five-number summary of GPA (mc_gpa) for full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          4  2.35  2.62   3.2   3.75  4     3.19  0.717
##  2 Am. Indian / AK … 2021          1  2.77  2.77   2.77  2.77  2.77  2.77 NA    
##  3 Asian             2020        252  0     2.5    3.31  3.75  4     2.98  1.01 
##  4 Asian             2021        217  0     2.5    3.23  3.71  4     2.90  1.09 
##  5 Black / African … 2020        341  0     1.62   2.5   3.18  4     2.34  1.13 
##  6 Black / African … 2021        307  0     1.33   2.67  3.43  4     2.33  1.29 
##  7 Foreign           2020         98  0     2      3     3.69  4     2.66  1.26 
##  8 Foreign           2021         98  0     1.19   2.82  3.69  4     2.42  1.40 
##  9 Hawaiian / Pac. … 2020          4  0     1.12   2.46  3.22  3.77  2.17  1.58 
## 10 Hawaiian / Pac. … 2021          3  1.75  2.22   2.68  3.34  4     2.81  1.13 
## 11 Hispanic          2020        482  0     1.5    2.8   3.46  4     2.45  1.24 
## 12 Hispanic          2021        569  0     1.5    2.69  3.4   4     2.36  1.27 
## 13 Multi-Race        2020         66  0     2      2.75  3.5   4     2.64  1.09 
## 14 Multi-Race        2021         60  0     1.5    2.79  3.58  4     2.47  1.31 
## 15 Unknown           2020         10  0.33  2      2.46  3.4   4     2.57  1.05 
## 16 Unknown           2021          3  2.55  2.65   2.75  3.38  4     3.1   0.786
## 17 White             2020        238  0     1.8    3     3.63  4     2.62  1.23 
## 18 White             2021        239  0     2      3     3.70  4     2.68  1.25

GPA of part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Part-time Students")

Five-number summary of GPA (mc_gpa) for part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          5     0  0     1      1.5      3  1.1   1.24 
##  2 Am. Indian / AK … 2021          1     2  2     2      2        2  2    NA    
##  3 Asian             2020         89     0  0.67  2.33   3.3      4  2.08  1.45 
##  4 Asian             2021         73     0  0.67  2      3.5      4  2.01  1.51 
##  5 Black / African … 2020        225     0  0     1.33   2.75     4  1.50  1.38 
##  6 Black / African … 2021        200     0  0     0.635  2.46     4  1.21  1.35 
##  7 Foreign           2020         78     0  0     2      3        4  1.78  1.49 
##  8 Foreign           2021         52     0  0     0      2.67     4  1.26  1.46 
##  9 Hawaiian / Pac. … 2020          2     0  0     0      0        0  0     0    
## 10 Hawaiian / Pac. … 2021          1     4  4     4      4        4  4    NA    
## 11 Hispanic          2020        379     0  0     1.5    3        4  1.63  1.45 
## 12 Hispanic          2021        290     0  0     1.58   3        4  1.64  1.43 
## 13 Multi-Race        2020         38     0  0.67  2      3.5      4  1.98  1.50 
## 14 Multi-Race        2021         38     0  0     2      3        4  1.69  1.53 
## 15 Unknown           2020          6     0  0.75  2.16   3.67     4  2.12  1.57 
## 16 Unknown           2021          2     3  3     3.5    4        4  3.5   0.707
## 17 White             2020        139     0  0     2.33   3.26     4  1.96  1.48 
## 18 White             2021        149     0  0.33  2.57   3.33     4  2.14  1.49

Hours Earned Rate

Density plot of Hours Earned Rate by year

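# Note: xlim(0, 1) removes rates outside [0, 1] before the density is estimated;
# coord_cartesian(xlim = c(0, 1)) would zoom in without dropping those rows.
ggplot(df_MCPS20D, aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.3) +
  facet_wrap(~full_part)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
  ggtitle(" Hours Earned Rate by Full-time vs Part-time Students")+
  xlim(0,1)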
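This section does not restate how `hours_earned_rate` is computed. The sketch below assumes it is `hours_earned / hours_attempted`. Under that assumption, it computes the rate and returns `NA` rather than `NaN` when no hours were attempted. It also counts rates outside [0, 1]. Rates above 1 are possible because `hours_earned` may include AP credits that are not part of `hours_attempted`. The helper `earned_rate()` and the data are hypothetical.

```r
# Hypothetical helper, ASSUMING hours_earned_rate = hours_earned / hours_attempted.
# Returns NA instead of NaN when no hours were attempted.
earned_rate <- function(earned, attempted) {
  ifelse(attempted > 0, earned / attempted, NA_real_)
}

# Made-up values; 15 earned vs. 12 attempted mimics AP credits counted as earned.
earned    <- c(12, 9, 0, 15)
attempted <- c(12, 12, 0, 12)

rate <- earned_rate(earned, attempted)
print(rate)

# Number of rates outside [0, 1] (missing rates are ignored).
n_outside <- sum(rate < 0 | rate > 1, na.rm = TRUE)
print(n_outside)
```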

Boxplots of hours_earned_rate by term year for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_earned_rate by term year for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Hours earned rate of full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Full-time Students")

Hours earned rate of part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Part-time Students")

5.4 Distribution of Variables and Correlation: Full-time Students

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2020")%>%
              filter(full_part=="FT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))
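By default, `ggpairs()` reports Pearson correlations in its upper panels. Because this section keeps the outliers in, a rank-based (Spearman) correlation could serve as a cross-check that is less sensitive to extreme values. The sketch below computes both with base R's `cor()` on a small made-up data frame that uses the same four column names. Here `hours_earned_rate` is computed as `hours_earned / hours_attempted`, which is an assumption.

```r
# Made-up data using the four column names passed to ggpairs().
toy <- data.frame(
  hours_attempted = c(12, 13, 15, 18, 12, 16),
  hours_earned    = c(12, 10, 15, 18,  6, 13),
  mc_gpa          = c(3.2, 2.5, 3.8, 4.0, 1.5, 3.0)
)
# ASSUMPTION: rate defined as earned / attempted.
toy$hours_earned_rate <- toy$hours_earned / toy$hours_attempted

# Pearson (ggpairs' default) and Spearman (rank-based) correlation matrices.
pearson  <- cor(toy, use = "pairwise.complete.obs")
spearman <- cor(toy, method = "spearman", use = "pairwise.complete.obs")

print(round(pearson, 2))
print(round(spearman, 2))
```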

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2021")%>%
              filter(full_part=="FT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))

5.5 Distribution of Variables and Correlation: Part-time Students

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2020")%>%
              filter(full_part=="PT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))

library(GGally)
# plot distributions and correlation of variables
df_MCPS20D%>% filter(term_year=="2021")%>%
              filter(full_part=="PT")%>%
              ggpairs(., columns = c("hours_attempted","hours_earned", "mc_gpa","hours_earned_rate"))